Empirical Knowledge Representation Generation Using N-Gram Clustering
Abstract
System Overview
The approach acquires a domain-specific semantic representation by carrying out a stochastic analysis of a corpus, using sets of conceptually similar paragraphs. The corpus and the semantic representation are then used to generate schematic structures, which concisely store the knowledge contained in existing texts. New texts are processed to update the knowledge base dynamically: any novel concept encountered is analysed, and a new structure is added to the representation. A more comprehensive explanation of this system, with references to related work, is given in (Collier 1994).
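The abstract does not state which n-gram features, similarity measure or clustering procedure the system uses. The sketch below is therefore a hypothetical illustration of the general idea, not the method described in (Collier 1994). It makes the following assumptions:

- Paragraphs are represented as sets of lowercase word bigrams.
- Similarity is the Jaccard coefficient.
- Clustering is a single-pass, leader-style procedure with an assumed threshold of 0.2.
- A new paragraph that matches no existing cluster opens a new one, standing in for a "novel concept" that adds a new structure.

The names `word_ngrams`, `jaccard` and `NGramClusterer` are illustrative only.

```python
from typing import List, Set, Tuple

Gram = Tuple[str, ...]


def word_ngrams(text: str, n: int = 2) -> Set[Gram]:
    """Return the set of lowercase word n-grams in a paragraph (hypothetical feature choice)."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set[Gram], b: Set[Gram]) -> float:
    """Jaccard similarity of two n-gram sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class NGramClusterer:
    """Hypothetical incremental clusterer. Each cluster stands in for one
    schematic structure and is summarised by the union of its members' n-grams."""

    def __init__(self, n: int = 2, threshold: float = 0.2) -> None:
        self.n = n
        self.threshold = threshold           # assumed value, not from the source
        self.profiles: List[Set[Gram]] = []  # one n-gram profile per cluster
        self.members: List[List[str]] = []   # paragraphs assigned to each cluster

    def add(self, paragraph: str) -> int:
        """Assign a paragraph to the most similar existing cluster, or open a
        new cluster (a 'novel concept') if no similarity reaches the threshold.
        Returns the index of the cluster the paragraph was placed in."""
        grams = word_ngrams(paragraph, self.n)
        best, best_sim = -1, 0.0
        for i, profile in enumerate(self.profiles):
            sim = jaccard(grams, profile)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            # Known concept: merge the paragraph into the existing structure.
            self.profiles[best] |= grams
            self.members[best].append(paragraph)
            return best
        # Novel concept: add a new structure to the representation.
        self.profiles.append(set(grams))
        self.members.append([paragraph])
        return len(self.profiles) - 1
```

Each cluster's profile is kept as a growing union of bigrams because this keeps the sketch short. One consequence is that large clusters become harder to join over time. A centroid or per-member average similarity would avoid that, at the cost of more bookkeeping.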
Similar resources
Natural Language Generation for Text-to-Text Applications Using an Information-Slim Representation
I propose a representation formalism and algorithms to be used in a new language generation mechanism for text-to-text applications. The generation process is driven by both text-specific information encoded via probability distributions over words and phrases derived from the input text, and general language knowledge captured by n-gram and syntactic language models. A Text-to-Text Perspective...
A Systematic Study on Document Representation and Dimensionality Reduction for Text Clustering
Increasingly large text datasets and the high dimensionality associated with natural language are a great challenge for text mining. In this research, a systematic study is conducted of the application of three Dimension Reduction Techniques (DRT) to three different document representation methods in the context of the text clustering problem, using several standard benchmark datasets. The dimensional...
Assessing Two-Mode Semantic Network Story Representations Using a False Memory Paradigm
This paper describes a novel method of representing semantic networks of stories (and other text) as a two-mode graph. This method has some advantages over traditional one-mode semantic networks, but has the potential drawback (shared with n-gram text networks) that it contains paths that are not present in the text. An empirical study was devised using a false memory paradigm to determine whet...
Ontology-Based Document Clustering Using a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...
Evaluation and Comparison of Concept Based and N-Grams Based Text Clustering Using SOM
With the large and rapidly growing number of documents available in digital form (Internet, libraries, CD-ROM...), the automatic classification of texts has become a significant research field and a fundamental task in document processing. This paper deals with the unsupervised classification of textual documents, also called text clustering, using Kohonen's Self-Organizing Maps in two new situations:...
Publication date: 1994